ABSTRACT

The work accomplished is represented by four Tech Reports already
issued and the development of three tests of goodness-of-fit for
censored data reported herein. All the Tech Reports are submitted for
publication. Two of the tests are developed using a result due to
Moses (J. Amer. Statist. Assoc. 59 (1964), 645-51) for uncensored data and
its modification for the censored data. The other is an extension of
the empty cell test to the censored case.
1. Introduction.

The accomplishments are represented by the following Technical
Reports (listed in chronological order), written and issued from time to
time, and the work on three tests of goodness-of-fit for censored data
reported herein below:

[1]  Korwar,  R.M.  (1980).  A  characterization  of  a  Polya-Eggenberger 
and  other  discrete  distributions  by  record  values. 

[2]  Korwar,  R.M.  (1981).  A  characterization  of  the  Waring  distribution. 

[3]  Korwar,  R.M.,  and  Naik,  D.N.  (1981).  Testing  for  equality  of 
means  with  additional  data  on  one  variable:  a  likelihood  ratio 
test  and  a  Monte  Carlo  study. 

[4]  Korwar, R.M. (1981). On characterizations of the power-function
and discrete uniform distributions through a model of over-reported
claims.

2. A Brief Description of the Work Reported in [1]-[4].

In [1] above, a class of Polya-Eggenberger distributions is
characterized by record values. The Polya-Eggenberger distribution is
one of the truly "contagious distributions" found very useful in applied
work. Specifically, let $X_1, X_2, \ldots$ be a sequence of independent and
identically distributed discrete random variables. Define the sequence
$\{N(n)\}$ by $N(1) = 1$,
$N(n) = \min\{j \mid j > N(n-1),\ X_j > X_{N(n-1)}\}$, $n = 2, 3, \ldots$.
Let $R_n = X_{N(n)}$. Then $\{R_n\}$ is the sequence of record values.
By convention $R_1 = X_1$. Assume $E(X_1)$ exists and is finite.
Here a characterization of a Polya-Eggenberger and other discrete
distributions, including the geometric, is made by the linearity of
regression of $R_2 - R_1$ on $R_1$.
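
The record-value construction can be illustrated with a short sketch. It
assumes the usual strict-inequality definition of an upper record; the
function name `record_values` and the sample sequence are hypothetical and
serve only as an illustration.

```python
def record_values(xs):
    """Return the record times N(n) (1-based) and the record values R_n of xs."""
    if not xs:
        return [], []
    times = [1]          # N(1) = 1 by convention
    records = [xs[0]]    # R_1 = X_1
    for j, x in enumerate(xs[1:], start=2):
        if x > records[-1]:      # assumed: a record must strictly exceed the last one
            times.append(j)
            records.append(x)
    return times, records


xs = [3, 1, 3, 5, 2, 5, 8]       # illustrative discrete observations
times, records = record_values(xs)
print(times)     # [1, 4, 7]
print(records)   # [3, 5, 8]
```

Ties do not create new records here, which matters for discrete
distributions such as the geometric.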


This paper is submitted to Sankhya for publication. An abstract
has appeared in the Bulletin of the Institute of Mathematical Statistics
(IMS Bulletin 10, #2 (1981), 64, #81t-33).

In [2] above, a characterization of the Waring distribution is made
by the identity of distributions. The Yule distribution, which is
sometimes used as a distribution of word frequencies in applied work,
is a special case of the Waring distribution. The Waring distribution
is characterized by the following property: For a positive
integer-valued random variable $X$ with $P(X = r) = p_r$,
$r = 1, 2, \ldots$, and with a finite mean $\mu$, define two
new random variables $Y$ and $Z$ by

$P(Y = r) = q_r = \bigl(\sum_{k=r+1}^{\infty} p_k + a p_r\bigr)/(\mu + a)$,  $r = 0, 1, \ldots$

$P(Z = r) = q'_r = (r + b)\,p_r/(\mu + b)$,  $r = 1, 2, \ldots$

where $a > 0$ and $b$ are constants with $b - a + 1 > 0$. Then $Z$ and
$Y$ truncated at 0 have the same distribution if and only if $X$ has
a Waring distribution of the form

$P(X = r) = (\lambda - c)\,c_{(r-1)}/\lambda_{(r)}$,  $r = 1, 2, \ldots$,
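
The "if" direction of this characterization can be checked numerically.
The sketch below rests on two assumptions that are not stated in the text:
the Waring probabilities are taken as (lam - c) c_(r-1) / lam_(r), with
x_(r) = x(x+1)...(x+r-1), and the constants are linked by the hypothetical
relation b = c - 1 + a(lam - c). Exact rational arithmetic is used, so a
constant ratio of the unnormalised weights of Y (for r >= 1) and Z means
the two normalised distributions coincide.

```python
from fractions import Fraction


def rising(x, r):
    """Rising factorial x(x+1)...(x+r-1), with rising(x, 0) = 1."""
    out = Fraction(1)
    for i in range(r):
        out *= x + i
    return out


def waring_pmf(r, lam, c):
    """Assumed Waring form: P(X = r) = (lam - c) c_(r-1) / lam_(r), r = 1, 2, ..."""
    return (lam - c) * rising(c, r - 1) / rising(lam, r)


def weight_ratios(lam, c, a, b, r_max):
    """Ratios (sum_{k>r} p_k + a p_r) / ((r + b) p_r) for r = 1..r_max."""
    ratios, cdf = [], Fraction(0)
    for r in range(1, r_max + 1):
        p_r = waring_pmf(r, lam, c)
        cdf += p_r
        tail = 1 - cdf                     # sum over k > r, computed exactly
        ratios.append((tail + a * p_r) / ((r + b) * p_r))
    return ratios


lam, c, a = Fraction(7, 2), Fraction(3, 2), Fraction(2)   # illustrative values
b = c - 1 + a * (lam - c)                # hypothetical link between (a, b) and (lam, c)
ratios = weight_ratios(lam, c, a, b, r_max=12)
print(len(set(ratios)) == 1)             # a single common ratio across all r
```

With these values lam - c = 2 > 1, so the mean is finite, and
b - a + 1 = 7/2 > 0 as the statement requires.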


In [3] above, a likelihood ratio test is derived for testing the
equality of means of a bivariate normal distribution with equal
variances when additional data on one variable are available. The
situation  can  also  be  viewed  as  if  some  observations  on  one  variable 
are  missing.  A  Monte  Carlo  study  is  conducted  to  study  the  power  and 
level  of  significance  attained  in  an  attempt  at  comparing  several  tests 
available  in  the  literature  along  with  the  proposed  test.  As  a  result 
of  the  study  an  indication  is  made  of  the  preferred  test  for  each 
combination  of  the  correlation  coefficient  and  difference  of  means. 

This was submitted to the Journal of the American Statistical Association
and a revision is underway. The revised version will be resubmitted
to the above journal or elsewhere. An abstract has been submitted and
will appear in the Bulletin of the Institute of Mathematical Statistics.
This research is a natural counterpart to Dahiya and Korwar (1980).

In [4] above, using a model for over-reported claims (such as
insurance claims for fire damage to property, etc.), some characterizations
of useful distributions in statistics are made. Using this model, which
assumes overreporting, the power-function and discrete uniform
distributions are characterized as follows: (1) the distribution of
observed claims suitably truncated on the right coincides with the true
distribution if and only if the distribution is of the power-function
form, and (2) a variable having a linear regression on the true claims
has a linear regression, with suitable slope and intercept, on the
reported claims if and only if the distribution is of the power-function
form. Similar results are obtained for the discrete uniform
distribution.


3. Two Tests of Goodness-of-fit for Censored Data Based on a Result
of Moses.

Suppose $Y_1^0, \ldots, Y_n^0$ is a sample of size n from a continuous
distribution G. Due to random censoring on the right we do not
observe the $Y_i^0$'s but

(3.1)  $Y_i = \min(Y_i^0, U_i)$,  $i = 1, \ldots, n$

where $U_1, \ldots, U_n$ are independent random variables (r.v.'s), called
censoring r.v.'s, with a continuous distribution function H.
Assume that the $Y_i^0$'s and $U_i$'s are mutually independent. We also
observe

(3.2)  $\delta_i = 1$ if $Y_i^0 \le U_i$, and $\delta_i = 0$ otherwise,

for $i = 1, \ldots, n$. The problem is, given the censored data

(3.3)  $(Y_i, \delta_i)$,  $i = 1, \ldots, n$,

to test whether the $Y_i^0$'s could have come from a specified distribution
F. That is, we would like to test

(3.4)  $H_0$: $G(y) = F(y)$ for all real y

against

(3.5)  $H_1$: $G(y) \ne F(y)$ for some real y.
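
The observation scheme (3.1)-(3.3) can be simulated in a few lines. This
is a minimal sketch: the exponential lifetime and censoring laws, the seed,
and the helper name `simulate_censored` are illustrative assumptions, not
part of the report.

```python
import random


def simulate_censored(n, draw_lifetime, draw_censor, rng):
    """Return [(Y_i, delta_i)] with Y_i = min(Y_i^0, U_i), delta_i = 1 if Y_i^0 <= U_i."""
    data = []
    for _ in range(n):
        y0 = draw_lifetime(rng)    # latent lifetime Y_i^0 from G
        u = draw_censor(rng)       # censoring variable U_i from H
        data.append((min(y0, u), 1 if y0 <= u else 0))
    return data


rng = random.Random(0)
data = simulate_censored(
    n=10,
    draw_lifetime=lambda r: r.expovariate(1.0),   # assumed G: exponential, rate 1
    draw_censor=lambda r: r.expovariate(0.5),     # assumed H: exponential, rate 0.5
    rng=rng,
)
for y, delta in data:
    print(f"{y:.3f}  {delta}")
```

Only the pairs printed here are available to the statistician; the latent
lifetimes of the censored cases are never seen.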


We derive two tests for testing $H_0$ by making use of Moses' (1964)
one-sample limits of some two-sample rank tests. Let $X_1, \ldots, X_m$ be a
sample of size m from F and let $Y_1, \ldots, Y_n$ be an independent sample of
size n from G. Both the distribution functions F and G are
assumed unknown. Then Moses showed that the limit, as $m \to \infty$, of
Lehmann's most powerful test of

$H_0$: $F(x) = G(x)$ for all real x

against

$H_1$: $G(x) = [F(x)]^k$,  $k > 1$

is to reject $H_0$ for large values of

(3.6)  $\sum_{j=1}^{n} \ln F(Y_j)$.

Note that now F becomes known since $m \to \infty$ and we have an infinite
sample from F. Similarly he shows that the limit, as $m \to \infty$, of the
Wilcoxon two-sample test of

$H_0$: $F(x) = G(x)$ for all real x

against

$H_1$: $F(x) > G(x)$ for all real x

is to reject $H_0$ for large values of

(3.7)  $\sum_{j=1}^{n} F(Y_j)$.
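
For an uncensored sample and a known F, the two limiting statistics (3.6)
and (3.7) are plain sums. A minimal sketch, in which the function name
`moses_statistics` and the uniform example are illustrative assumptions:

```python
import math


def moses_statistics(sample, F):
    """Return (sum of ln F(Y_j), sum of F(Y_j)) for an uncensored sample."""
    log_stat = sum(math.log(F(y)) for y in sample)   # statistic (3.6); needs F(y) > 0
    sum_stat = sum(F(y) for y in sample)             # statistic (3.7)
    return log_stat, sum_stat


F_unif = lambda y: min(1.0, max(0.0, y))   # assumed F: uniform on (0, 1)
log_stat, sum_stat = moses_statistics([0.5, 0.25], F_unif)
print(log_stat, sum_stat)
```

Both statistics are large when the observations sit in the upper tail of F.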
Now back to $H_0$ (3.4). We cannot directly use (3.7) with
censored data. Because of censoring some of the $F(Y_j^0)$ cannot be
computed. We replace (3.7) by its conditional expectation given G = F
and the data (3.3). Thus our test statistic will be

$T_n = E\bigl(\sum_{j=1}^{n} F(Y_j^0) \mid F, Y_i, \delta_i, i = 1, \ldots, n\bigr)$.

Now

$E(F(Y_j^0) \mid F, Y_j, \delta_j = 1) = F(Y_j)$,

$E(F(Y_j^0) \mid F, Y_j, \delta_j = 0) = \int_{Y_j}^{\infty} F(y)\,dF(y) \Big/ \int_{Y_j}^{\infty} dF(y) = \tfrac{1}{2}\{1 + F(Y_j)\}$.
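
The value of the conditional expectation for a censored observation can be
checked by simulation. The sketch below assumes a unit exponential F, a
fixed censoring point t = 0.7 and a fixed seed, all chosen only for
illustration; it compares a Monte Carlo average of F(Y^0) over draws with
Y^0 > t against one half of 1 + F(t).

```python
import math
import random


def F(y):
    """Assumed hypothesized distribution: unit exponential."""
    return 1.0 - math.exp(-y) if y > 0 else 0.0


def mc_cond_mean(t, n, rng):
    """Monte Carlo estimate of E[F(Y0) | Y0 > t] when Y0 has distribution F."""
    vals = [F(y0) for y0 in (rng.expovariate(1.0) for _ in range(n)) if y0 > t]
    return sum(vals) / len(vals)


rng = random.Random(1)
t = 0.7
estimate = mc_cond_mean(t, 200_000, rng)
closed_form = 0.5 * (1.0 + F(t))
print(round(estimate, 3), round(closed_form, 3))
```

The agreement reflects the fact that F(Y^0) is uniform on (0, 1) under
G = F, so conditioning on Y^0 > t leaves it uniform on (F(t), 1).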


Thus, we take as our test statistic

(3.8)  $T_n = \sum_{j=1}^{n} \delta_j F(Y_j) + \tfrac{1}{2} \sum_{j=1}^{n} (1 - \delta_j)\{1 + F(Y_j)\} = \sum_{j=1}^{n} V_j$,

where

(3.9)  $V_j = \tfrac{1}{2}\{(1 + \delta_j) F(Y_j) + (1 - \delta_j)\}$.
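
Computing the statistic from censored pairs is direct. A minimal sketch,
assuming the data are stored as (Y_j, delta_j) tuples and that F is passed
as a Python callable; the names `v_terms` and `t_statistic` and the sample
values are hypothetical.

```python
import math


def v_terms(data, F):
    """V_j = 0.5 * ((1 + delta_j) * F(Y_j) + (1 - delta_j)) for each pair."""
    return [0.5 * ((1 + d) * F(y) + (1 - d)) for y, d in data]


def t_statistic(data, F):
    """T_n as the sum of the V_j."""
    return sum(v_terms(data, F))


F_exp = lambda y: 1.0 - math.exp(-y) if y > 0 else 0.0   # assumed F: unit exponential
data = [(0.2, 1), (1.5, 0), (0.9, 1), (0.4, 0)]          # illustrative (Y_j, delta_j)
print(t_statistic(data, F_exp))
```

An uncensored pair contributes F(Y_j), while a censored pair contributes
the midpoint between F(Y_j) and 1.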


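In the following theorem we prove the asymptotic normality of $T_n$.

Theorem 3.1: The statistic $T_n$ is asymptotically normal with
asymptotic mean and variance $n\mu$ and $n\sigma^2$ where

(3.10)  $2\mu = 2E(V_1) = 1 - \int G(u)\,dH(u) + \int F(y)\,dK(y) + \int \bigl\{\int_{-\infty}^{u} F(y)\,dG(y)\bigr\}\,dH(u)$,

(3.11)  $\sigma^2 = \mathrm{Var}(V_1) = E(V_1^2) - \mu^2$,

(3.12)  $4E(V_1^2) = 1 - \int G(u)\,dH(u) + \int F^2(y)\,dK(y) + 3\int \bigl\{\int_{-\infty}^{u} F^2(y)\,dG(y)\bigr\}\,dH(u) + 2\int F(u)\{1 - G(u)\}\,dH(u)$.

Here $K(y) = 1 - \{1 - G(y)\}\{1 - H(y)\}$ is the distribution function of
$Y_1$, and integrals without limits extend over the whole real line.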

Proof: The theorem follows from the central limit theorem and the
fact that the $V_j$'s are independent and identically distributed bounded
random variables with common mean and variance given by (3.10)-(3.12).

Note that since G = F under $H_0$ the asymptotic null mean and
variance, $n\mu_0$ and $n\sigma_0^2$, are given by (3.10)-(3.12) where we
replace G by F.

The censoring distribution H appearing in (3.10)-(3.12) is generally
unknown and must be estimated from the data.

The estimation of $\bar H(u) = 1 - H(u)$ from the data by the method of
Kaplan and Meier (1958) is completely analogous to the estimation of
$\bar G(y) = 1 - G(y)$ from the data using the same method, except for
the fact that the $(1 - \delta_j)$'s now play the role of the $\delta_j$'s
before. Let $Y_{(1)} < \cdots < Y_{(n)}$ be the ordered $Y_j$'s and let
$\epsilon_j = 1 - \delta_{[j]}$, where $\delta_{[j]}$ is the $\delta$ that
goes with $Y_{(j)}$, $j = 1, \ldots, n$. Then the Kaplan-Meier (K-M)
estimator $\hat{\bar H}(u)$ of $\bar H(u)$ is given by

(3.13)  $\hat{\bar H}(u) = \prod_{j=1}^{k-1} \{(n-j)/(n-j+1)\}^{\epsilon_j}$,  $u \in (Y_{(k-1)}, Y_{(k)}]$,

and $\hat{\bar H}(u) = 0$ for $u > Y_{(n)}$. Thus consistent estimators
$\hat\mu_0$ and $\hat\sigma_0^2$ of $\mu_0$ and $\sigma_0^2$ respectively
can be obtained from $\mu_0$ and $\sigma_0^2$ by replacing H by $\hat H$
(3.13) appearing in their expressions. The consistency of the resulting
estimators of $\mu_0$ and $\sigma_0^2$ follows from the weak convergence
of the K-M estimator. Combining Theorem 3.1 and the above we have

(3.14)  $Z_n = (T_n - n\hat\mu_0)/(\sqrt{n}\,\hat\sigma_0) \xrightarrow{d} N(0,1)$,  $n \to \infty$.

Finally, to test $H_0$ against $H_1$ at level $\alpha$, we reject $H_0$ if
$|Z_n| > z_{\alpha/2}$ and accept otherwise, where $z_{\alpha/2}$ is the
$(1 - \alpha/2)100$-th percentile of the standard normal distribution.
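
The whole procedure can be sketched end to end. Two points in the sketch
are our own simplifications and are not stated in the text. First, working
from the definition of V_j under G = F and writing c = F(U), one finds
E(V_j) = 1/2 and Var(V_j) = E(c - c^2 + c^3/3)/4, so only moments of F(U)
under H need estimating. Second, any Kaplan-Meier mass left after the
largest observation is placed at that observation, in line with the
convention that the estimated survival function is 0 beyond it. The
exponential models, seed and function names are illustrative.

```python
import math
import random


def km_censoring_mass(data):
    """Jumps (y, mass) of the K-M estimate of H, with 1 - delta as the event."""
    ordered = sorted(data)                 # assumes no ties among the Y_j
    n = len(ordered)
    surv, masses = 1.0, []
    for j, (y, delta) in enumerate(ordered, start=1):
        if delta == 0:                     # censored case: factor (n-j)/(n-j+1)
            new_surv = surv * (n - j) / (n - j + 1)
            masses.append((y, surv - new_surv))
            surv = new_surv
    if surv > 0.0:                         # leftover mass goes to the largest Y
        masses.append((ordered[-1][0], surv))
    return masses


def null_variance(data, F):
    """Plug-in estimate of sigma_0^2 = E[c - c^2 + c^3/3] / 4, with c = F(U)."""
    total = 0.0
    for y, mass in km_censoring_mass(data):
        c = F(y)
        total += mass * (c - c * c + c ** 3 / 3.0)
    return total / 4.0


def z_statistic(data, F):
    """Z_n = (T_n - n/2) / sqrt(n * estimated sigma_0^2)."""
    n = len(data)
    t_n = sum(0.5 * ((1 + d) * F(y) + (1 - d)) for y, d in data)
    return (t_n - 0.5 * n) / math.sqrt(n * null_variance(data, F))


def simulate(n, rate, censor_rate, rng):
    out = []
    for _ in range(n):
        y0, u = rng.expovariate(rate), rng.expovariate(censor_rate)
        out.append((min(y0, u), 1 if y0 <= u else 0))
    return out


F = lambda y: 1.0 - math.exp(-y)           # assumed hypothesized F: unit exponential
rng = random.Random(2)
z_null = z_statistic(simulate(400, 1.0, 0.5, rng), F)   # data generated under H_0
z_alt = z_statistic(simulate(400, 3.0, 0.5, rng), F)    # lifetimes from a different G
print(abs(z_null) > 1.96, abs(z_alt) > 1.96)            # two-sided 5% decisions
```

With no censoring at all, the variance estimate reduces to the familiar
1/12 of a uniform variable, which gives a quick sanity check.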

A similar test can be constructed using (3.6) and the same
technique of replacing the test statistic for the uncensored case by
its conditional expectation given G = F and the data for the censored
data case. The resulting test statistic will be asymptotically
normal, since it is again a sum of
independent and identically distributed random variables.

Hollander  and  Proschan  (1979)  use  the  same  idea  due  to  Moses  and 
come  up  with  a  test  different  from  our  tests. 


In this section we derive an empty cell test for censored data.
We use the notation developed in section 3. Using the hypothesized
continuous distribution function F, choose points
$x_0 = -\infty < x_1 < \cdots < x_{N-1} < x_N = \infty$ such that
$F(x_k) - F(x_{k-1}) = 1/N$, for $k = 1, \ldots, N$, where N is a
specified positive integer. In the uncensored case the test statistic
used is

(4.1)  $\mu_0(n,N)$ = # of intervals $(x_{k-1}, x_k]$ containing no observation $Y_j^0$.

The test is to reject $H_0$ (3.4) if $\mu_0 \ge C$, where C is chosen to
have an $\alpha$ level test. The empty cell test is attractive because of
its simplicity. An excellent reference on the subject is the recent
book by Kolchin et al. (1978).
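
For uncensored data the count in (4.1) needs only the values F(Y_j),
because cell k holds an observation exactly when (k-1)/N < F(Y_j) <= k/N.
A minimal sketch, in which the function name `empty_cells` and the uniform
example are illustrative assumptions:

```python
import math


def empty_cells(sample, F, N):
    """Number of the N equiprobable cells (x_{k-1}, x_k] holding no observation."""
    occupied = set()
    for y in sample:
        # Cell index k with (k-1)/N < F(y) <= k/N, clamped to 1..N.
        k = min(N, max(1, math.ceil(F(y) * N)))
        occupied.add(k)
    return N - len(occupied)


F_unif = lambda y: min(1.0, max(0.0, y))   # assumed F: uniform on (0, 1)
print(empty_cells([0.05, 0.15, 0.95, 0.12], F_unif, N=4))   # 2
```

Here three observations fall in the first cell and one in the fourth, so
the two middle cells are empty.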

Because of right censoring not all the $Y_j^0$'s are observed.
Hence $\mu_0$ (4.1) cannot in general be computed. We replace $\mu_0$ by its
conditional expectation given G = F and the censored data (3.3)
and use the resulting random variable as the test statistic. Let

(4.2)  $\mu_d(n,N)$ = # of apparent empty cells,

and let $C_i = (x_{k_i - 1}, x_{k_i}]$, $1 \le i \le \mu_d(n,N)$, denote the
i-th apparent empty cell.


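Then, it can be shown that

$\mu_0^*(n,N) = E\{\mu_0(n,N) \mid Y_j, \delta_j, j = 1, \ldots, n\}$

$= \sum_{i=1}^{\mu_d} \prod_{j=1}^{n} \Bigl[1 - \frac{\min\{\bar F(x_{k_i - 1}), \bar F(Y_j)\} - \min\{\bar F(Y_j), \bar F(x_{k_i})\}}{\bar F(Y_j)}\Bigr]^{1 - \delta_j}$,

where $\bar F(y) = 1 - F(y)$.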
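
The conditional expectation can be evaluated cell by cell. The sketch
below assumes that an "apparent empty cell" is a cell containing no
uncensored observation, which is our reading and is not spelled out in the
text. Given G = F, a censored lifetime lies beyond its censoring point, so
the chance that it falls in a cell is the F-mass of the part of the cell
beyond that point divided by 1 - F(Y_j). The function name and the uniform
example are illustrative.

```python
import math


def expected_empty_cells(data, F, N):
    """E[number of empty cells | (Y_j, delta_j)] under G = F, with N equiprobable cells."""
    occupied = {min(N, max(1, math.ceil(F(y) * N))) for y, d in data if d == 1}
    total = 0.0
    for k in range(1, N + 1):
        if k in occupied:
            continue                       # holds an uncensored observation
        lo, hi = (k - 1) / N, k / N        # F(x_{k-1}) and F(x_k)
        prob_empty = 1.0
        for y, d in data:
            if d == 0:
                p = F(y)                   # assumes F(y) < 1
                hit = (max(hi, p) - max(lo, p)) / (1.0 - p)
                prob_empty *= 1.0 - hit    # censored lifetime misses cell k
        total += prob_empty
    return total


F_unif = lambda y: min(1.0, max(0.0, y))   # assumed F: uniform on (0, 1)
print(expected_empty_cells([(0.1, 1), (0.2, 0)], F_unif, N=2))
```

In the example the second cell is apparently empty, and the lifetime
censored at 0.2 lands in it with probability 0.5/0.8, so the expected
number of empty cells is 0.375 up to rounding.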


The distribution theory, both small and large sample, of $\mu_0^*$ is now
being derived and will be reported later.